Meridian Design Doc 6:

Introduction

Meridian stands for Measure, Evaluate, Reward: the three steps of the impact evaluator framework. The Meridian project aims to create an impact evaluator for “off-chain” networks, i.e. networks of nodes that do not maintain a shared ledger or blockchain of transactions.

This doc proposes a framework and model for Meridian that will initially cater to both the Saturn payouts system and the SPARK Station module. As a bonus, it should be able to cater to any Station module. We believe that trying to generalize beyond these few use cases at this point may be counterproductive.

We will structure this design doc based on the three steps of an Impact Evaluator (IE): measure, evaluate, and reward.

Overview

Setup

sequenceDiagram
  autonumber
  participant C as Client
  participant I as Contract: IE
  participant P as Peer
  par Funding
    loop
      C->>I: Fund work
    end
  and Work
    loop
      P->>P: Perform work
    end
  and Impact Evaluator
    loop
      I->>P: Measure
      I->>I: Evaluate
      I->>P: Reward
    end
  end

Generalizability

What can Meridian implementors re-use?

  • Impact evaluator smart contract
  • measure service
  • evaluate service scaffolding

What do Meridian implementors have to provide?

  • Peer
  • Deployments of measure and evaluate services
  • evaluate service business logic (fraud detection, evaluation)

flowchart TD
  MS[Meridian Measure Service]
  P[*Peer] --measurement--> MS
  MS --commitment--> IE[Meridian IE Contract]
  subgraph ES [Meridian Evaluate Service]
    E[*Evaluate function]
  end
  ES--evaluation-->IE
  MS--measurement-->ES
  IE--rewards-->P

Measure

sequenceDiagram
  autonumber
  participant P as Peer
  participant M as Service: Measure
  participant I as Contract: IE
  participant E as Service: Evaluate
  par Measure
    loop
      alt Peer uses optional measurement service
        P->>M: Upload measurements
        activate M
        M->>M: Store measurements
        M->>M: Aggregate measurements
        M->>M: Expose via IPFS
        M->>I: addMeasurements(measurementsCID)
        deactivate M
      else Peer self-submits
        P->>P: Aggregate measurements
        P->>P: Expose via IPFS
        P->>I: addMeasurements(measurementsCID)
      end
      activate I
      I->>I: Store measurements
      I->>I: Emit Measurement event
      I->>I: Maybe advance round
      deactivate I
    end
	and Evaluate: Preprocess
    loop
	    E->>I: Await Measurement event
      E->>E: Fetch measurements via IPFS
	    E->>E: Detect fraud
	    E->>E: Aggregate
	    E->>E: Store aggregates
    end
  end
  

Evaluate

sequenceDiagram
  autonumber
  participant I as Contract: IE
  participant E as Service: Evaluate
  I->>I: Advance round
  E->>I: Await Round event
  E->>I: getRound(round-1)
  E->>E: Await all measurements have been pre-processed
  E->>E: Fetch aggregates
  E->>E: Calculate reward shares
  E->>I: setScores(round, scores, summary)
  I->>I: Store scores
  

Reward

sequenceDiagram
  autonumber
  participant I as Contract: IE
  participant P as Peer
  I->>P: Send FIL
  

Decentralization of services

Note that while some services are used in addition to smart contracts, the design is decentralized:

  • The measure service is optional and is hosted for peer convenience. Peers can decide not to use it, or run their own. Every peer is able to submit measurements themselves, but then they must handle aggregation, pinning, and gas costs.
  • The evaluate service’s outputs will be reproduced by one or more parties, to verify correctness.

Implementations

SPARK x Meridian

SPARK currently requires centralized tasking in order to implement fraud detection. A separate proposal describes a decentralized tasker.

Whether a centralized tasker is a deviation from the impact evaluator architecture isn’t clear. On one hand, tasking might not be considered part of the IE. On the other hand, without the centralized tasker, peers won’t perform any work, and there is no impact to evaluate.

Saturn x Meridian

Measurement

  • Saturn doesn’t want to expose the raw measurement logs to the world
  • The IPFS interface for transmitting measurements to the evaluate preprocess service might not work for Saturn’s traffic volume

Evaluate

  • Could we get another trusted party to run the evaluation pipeline, so that there is no single responsible party (which would make it centralized)?

Impact Evaluator

Implementation

See also

Round logic

Rounds are advanced in configurable block-number increments. The trigger runs inside the IE’s addMeasurements() function, as it is called frequently during the round.
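
As a sketch, the advance check might look like this (illustrative Python standing in for the Solidity contract; all names and structure are assumptions, not the real implementation):

```python
# Illustrative sketch of round advancement, not the actual contract code.
# The real logic lives in Solidity inside addMeasurements().
class ImpactEvaluator:
    def __init__(self, round_length_blocks):
        self.round_length_blocks = round_length_blocks  # configurable increment
        self.round_index = 0
        self.round_start_block = 0
        self.rounds = {0: []}  # round index -> measurement CIDs

    def add_measurements(self, cid, current_block):
        self.rounds[self.round_index].append(cid)
        self._maybe_advance_round(current_block)

    def _maybe_advance_round(self, current_block):
        # Advance once the configured number of blocks has elapsed.
        if current_block >= self.round_start_block + self.round_length_blocks:
            self.round_index += 1
            self.round_start_block = current_block
            self.rounds[self.round_index] = []
```

Because the check piggybacks on addMeasurements(), a round only advances when the contract is actually being called, which is acceptable given the high call frequency.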

Round lengths

Round length should be configurable without major scaling considerations, as the Evaluate: preprocess step takes care of pre-aggregating measurements. The more significant factors determining round length are likely peer incentives / feedback loop and gas cost.

It is up to the Meridian implementer to decide how long their rounds shall be. Round lengths can also be adjusted while the IE is running. Adjustments will become effective at the start of next round.

Alternatives considered

  • Advance after time. Similar to advancing based on block numbers, but harder to get right because it relies on clocks
  • Advance after measurements. Harder to control, however easier to scale / operate.

Indexing

Rounds can be indexed by their start block number and by their static index in the rounds array.

Payouts and funding

Per round, a fixed amount of FIL will be distributed among peers. While this isn’t as incentivizing as allowing over-performers to claim more than anticipated, it helps control cost and fraud-detection risk.

In order to incentivize peers to contribute to the system, the contract should be pre-charged a couple of rounds in advance. This also helps decouple the IE from the team running it, as peers can be confident there will be rewards even if the team stops operating.
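
The fixed-pool split can be sketched in a few lines (illustrative Python; the real contract would work in integer attoFIL rather than floats):

```python
def split_rewards(pool_fil, scores):
    """Split a fixed per-round reward pool proportionally to impact scores.

    pool_fil: total FIL allocated to the round
    scores: mapping of peer address -> impact score
    """
    total = sum(scores.values())
    if total == 0:
        return {peer: 0 for peer in scores}  # nothing to pay out
    return {peer: pool_fil * score / total for peer, score in scores.items()}
```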

Measure

See the diagram above describing components and flow.

In the measure step, we refer to each atomic item that gets measured as a job. For example, each retrieval served by a Saturn node is a job. For SPARK, each retrieval made from an SP is a job.

This step is implemented using 2 components:

  • An opt-in measure service
  • The impact evaluator smart contract

Peers who opt in to a hosted measure service periodically submit measurements (job logs) to it; the service commits batches to the impact evaluator smart contract and exposes them over IPFS for the evaluate step.

Peers who do not opt in to the measure service are themselves responsible for aggregating measurements, submitting them to the smart contract, and exposing them via IPFS.

Measurements need to be retrievable via IPFS until rewards have been paid for the round they were submitted in.

Since all measurements are publicly retrievable, the measure service doesn’t need to provide any proofs (of inclusion). If peers find that a measure service misbehaves, they can use another one, deploy their own, or submit directly.
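
The self-submission path can be sketched as follows (illustrative Python; the sha256 digest is only a stand-in for computing the real IPFS CID of the pinned payload):

```python
import hashlib
import json

def aggregate_measurements(job_logs):
    """Aggregate a peer's job logs into one deterministic payload.

    Returns the serialized batch plus a digest standing in for its CID.
    """
    payload = json.dumps(job_logs, sort_keys=True).encode()
    digest = hashlib.sha256(payload).hexdigest()  # stand-in, not a real CID
    return payload, digest

# A self-submitting peer would pin `payload` to IPFS and then call
# addMeasurements(<CID>) on the contract, paying the gas itself.
```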

Data Model

Measurements

The measure service (or peers directly) periodically submits CIDs to the impact evaluator smart contract.

  • How do we store large amounts of JSON blobs efficiently? UnixFS, or just one node with a large string?

// For readability, these JSON objects have been pretty printed

// Generalized record
{
    "job_id": "<UUID or CID>",     // Unique job id
    "peer_id": "<Libp2p Peer ID>", // Who completed the job
    "started_at": "Timestamp",     // When did the job begin
    "ended_at": "Timestamp",       // When did the job end
    // Any other fields that are useful measurements of work done
}

// Example Saturn record
{
    "job_id": "abcdef",
    "peer_id": "<Libp2p Peer ID>",
    "started_at": "2023-05-01 00:52:57.62+00",
    "ended_at": "2023-05-01 00:52:58.62+00",
    "num_bytes_sent": 240,
    "request_duration_sec": 10,
    "ttfb_ms": 35,
    "status_code": 200,
    "cache_hit": true
}

// Example SPARK record
{
    "job_id": "abcdef",
    "peer_id": "<Libp2p Peer ID>",
    "started_at": "2023-05-01 00:52:57.62+00",
    "ended_at": "2023-05-01 00:52:58.62+00",
    "status_code": 200,
    "signature_chain": "<signature chain>",
    "num_bytes": 200,
    "ttfb_ms": 45 
}
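
Since module-specific fields vary, downstream consumers only need to rely on the generalized fields being present. A minimal sketch of such a check (Python; field set taken from the generalized record above):

```python
# Fields every measurement record must carry, per the generalized record.
REQUIRED_FIELDS = {"job_id", "peer_id", "started_at", "ended_at"}

def is_valid_record(record):
    """True if the record carries all generalized fields.

    Module-specific fields (num_bytes_sent, ttfb_ms, ...) may appear on top.
    """
    return REQUIRED_FIELDS.issubset(record)
```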

Implementation

GitHub - filecoin-station/meridian-measure-service: Boilerplate for a Meridian _measure_ service
https://github.com/filecoin-station/meridian-measure-service/tree/main

Evaluate

At this point we have an array of CIDs stored in the impact evaluator’s current round, pointing at measurements exposed via IPFS. The next step is to evaluate over a round’s measurements, using the evaluation function.

Evaluation Function

In general, for t evaluation fields, for node i where 1 <= i <= n, with logs l_{i,1}, ..., l_{i,j}, ..., l_{i,m_i}, evaluations on those logs x_{l_{i,1},1}, ..., x_{l_{i,1},t}, ..., x_{l_{i,m_i},1}, ..., x_{l_{i,m_i},t}, and evaluation function f, we can calculate the evaluation output as

y := f(x_{l_{1,1},1}, ..., x_{l_{1,1},t}, ..., x_{l_{1,m_1},1}, ..., x_{l_{1,m_1},t}, ..., x_{l_{n,1},1}, ..., x_{l_{n,1},t}, ..., x_{l_{n,m_n},1}, ..., x_{l_{n,m_n},t})

where y = (y_1, ..., y_n) and y_k is the evaluation of node k.

In the case of Saturn, the evaluation function is a function of the number of bytes sent, TTFB, and the request duration. It is calculated by the Saturn payouts system. See https://hackmd.io/@cryptoecon/saturn-aliens/%2FMqxcRhVdSi2txAKW7pCh5Q for more details.

In the case of SPARK, the evaluation function is simply a count of the number of successful requests with valid signature chains that a Station has performed. Specifically, for node k,

y_k = \frac{\sum_{j=1}^{m_k} \delta_{kj}}{\sum_{i=1}^{n} \sum_{j=1}^{m_i} \delta_{ij}}

where \delta_{ij} = 1 if the log with index j of node i is valid and 0 otherwise.
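
This share is straightforward to compute (illustrative Python; the `is_valid` predicate stands in for SPARK’s signature-chain and status checks):

```python
def spark_scores(logs_by_peer, is_valid):
    """Compute each peer's share of valid logs (y_k from the formula above).

    logs_by_peer: peer id -> list of logs
    is_valid: predicate playing the role of delta_ij
    """
    valid = {peer: sum(1 for log in logs if is_valid(log))
             for peer, logs in logs_by_peer.items()}
    total = sum(valid.values())
    return {peer: count / total for peer, count in valid.items()}
```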

Reproducibility

  • In order to decentralize the Evaluate service, it can be run by multiple parties. My first thought was to let the PL-run Evaluate service submit evaluations to chain. The other services can then publish their results off-chain, to confirm that the evaluations are correct. This has the downside that the contract only trusts one party, and aside from reputation the other parties can’t change anything about the contract’s operation.
    • Another idea would be to let all evaluating parties submit their results to the contract. Only once a certain quorum of equal results is reached does the contract trigger the rewards phase. This seems more decentralized, but still has the downside that conflicts need to be resolved by a PL admin.
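
The quorum idea could work roughly like this (illustrative Python; comparing digests of the submitted score sets is an assumption, not a settled design):

```python
from collections import Counter

def reached_quorum(submissions, quorum):
    """Return the result digest agreed on by at least `quorum` evaluators.

    submissions: evaluator address -> digest of its submitted scores
    Returns None while no digest has enough matching submissions.
    """
    if not submissions:
        return None
    digest, count = Counter(submissions.values()).most_common(1)[0]
    return digest if count >= quorum else None
```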

Multi stage evaluation

In Meridian, evaluation is a two-stage process. Evaluation stage I is a data preprocessing pipeline that periodically pre-filters and aggregates measurement results. This smaller dataset is then consumed by evaluation stage II, which is executed once before the rewards phase.

The two-stage design is one of the lessons from the Saturn project: data needs to be aggregated and pre-filtered, as otherwise e.g. a once-a-month evaluation run will operate over too large a dataset and pose serious scaling issues.

Conveniently, fraud detection is also a two-stage process, and each evaluation stage comes with one fraud detection stage.

Evaluation Stage I: Data preprocessing

See the diagram above describing components and flow.

The data preprocessing pipeline is executed whenever a measure CID has been committed to the impact evaluator smart contract. The pipeline retrieves raw measurements via IPFS, performs its preprocessing steps, and finally stores results in its internal data store.

Fraud Detection

Based on the network-specific fraud detection function, measurements are aggregated into two buckets:

  • Honest measurements: Data used for later processing in evaluation stage II and reward
  • Fraudulent measurements: Data kept for reference

The fraud detection function maps an individual measurement to a boolean fraudulent status:

fraudulent_i = fraudDetection(measurement_i)

flowchart TD
  subgraph Measurements
	  M1[Measurement]
	  M2[Measurement]
	  M3[Measurement]
  end
  subgraph Buckets
    BH[Honest]
    BF[Fraudulent]
  end
  Measurements--detect fraud-->Buckets
  M1[Measurement] --> BH
  M2[Measurement] --> BH
  M3[Measurement] --> BF
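
The bucketing step amounts to a single partition over the round’s measurements (illustrative Python; the network-specific fraudDetection predicate is supplied by the implementor):

```python
def bucket_measurements(measurements, fraud_detection):
    """Partition measurements into Honest and Fraudulent buckets."""
    honest, fraudulent = [], []
    for m in measurements:
        (fraudulent if fraud_detection(m) else honest).append(m)
    return honest, fraudulent
```
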
💡
Some fraud detection functions depend on attackers not knowing the function, and therefore must stay private.

Aggregation

Measurements from both buckets will be aggregated, and those aggregations stored in internal data storage.

However, only measurements from the Honest bucket will count when evaluation stage II determines the peer’s impact on the system.

flowchart TD
  subgraph Aggregates
    AH[Honest]
    AF[Fraudulent]
  end
  subgraph Database
    TH[Honest]
    TF[Fraudulent]
  end
  AH --> TH
  AF --> TF
  AH --> Evaluation

Evaluation Stage II

See the diagram above describing components and flow.

After each round, the evaluate stage II service converts preprocessed, aggregated measurements (from the Honest bucket) into evaluation results.

It also executes a second round of fraud filtering.

By committing scores to the impact evaluator smart contract, evaluations are committed on chain and the reward phase will begin.

The evaluation process runs off chain, because the dataset (all aggregated measurements produced by evaluation stage I) is too large to be handled by smart contracts.

Fraud detection

Multiple processes can mark peers or logs as fraudulent, in between measure and evaluate. For example, the Saturn Orchestrator can mark a peer as fraudulent when it fakes its speed test results.

Therefore, all measurements that are part of the Honest bucket but have later been flagged as fraudulent (or are associated with a peer that has been flagged) will not be fed into the evaluation function.
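
A sketch of that late filter (illustrative Python; the flagged-peer set would come from e.g. the Saturn Orchestrator):

```python
def filter_flagged(honest_aggregates, flagged_peers):
    """Drop stage-I Honest aggregates whose peer was flagged afterwards."""
    return [agg for agg in honest_aggregates
            if agg["peer_id"] not in flagged_peers]
```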

💡
Some fraud detection functions depend on attackers not knowing the function, and must stay private.

Data Model

At the end of evaluation stage II, data of the following shape is committed on chain by calling the setScores function:

  • round: The index of the evaluated round
  • addresses: The addresses of the peers involved in the round
  • scores: The impact scores for each peer involved in the round, which maps to a share of the round’s reward pool
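
For illustration, assembling the setScores call data might look like this (Python sketch; on chain the scores would be fixed-point integers rather than floats):

```python
def build_set_scores_args(round_index, scores):
    """Shape evaluation results into (round, addresses, scores).

    scores: peer address -> share of the round's reward pool
    Addresses are sorted so the two arrays line up deterministically.
    """
    addresses = sorted(scores)
    return round_index, addresses, [scores[addr] for addr in addresses]
```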

Reward

In the reward step, the impact evaluator smart contract directly sends peers their reward share, as determined by the previous step.

sequenceDiagram
  autonumber
  participant I as Contract: IE
  participant P as Peer
  I->>P: Send FIL
  

Push vs pull payments

Pull payments have advantages:

  • The system doesn’t have to pay the transaction gas fees
  • Peers have the freedom to claim whenever they want (e.g. tax benefits), or not to claim at all
  • They are potentially less coupled (legally) to the party funding the payments

However, we decided to go with push payments, for these reasons:

  • They allow us to change round length freely. With short rounds, peers claiming rewards for each individual round would be cumbersome
  • The smart contract logic is significantly easier
  • Clients (namely Filecoin Station) don’t need to implement claiming functionality
  • Clients don’t need to have an existing balance (for paying gas fees) in order to receive rewards

Smart Contracts

Depending on how complex the contracts turn out to be, we will either hire contractors or write them ourselves. Our current thinking is that the contract work will be simple enough (either because the contracts are simple or because existing contracts can be reused) that we would prefer to write them ourselves. This puts us more in control and alleviates timeline / cross-team orchestration concerns.

Independent of which team creates the contracts, audits will be required anyway.

Specs

Testing

Testing will depend on our choice of smart contract development framework. The recommendation is to proceed with Foundry, because it has built-in invariant testing and auto-generates Rust bindings for the contracts, which are useful for integration tests.

Smart Contract Testing

  • Unit tests for basic contract functions such as claiming a payout. With Foundry we can write these tests in Solidity.
  • Invariant testing. This is built into Foundry, or can be done with a separate library such as Echidna. It applies fuzz testing to the contract as a whole.
  • Static analysis. There are established analysis tools for EVM smart contracts, and we should use them.

One caveat with testing FVM smart contracts is that if we want to use Filecoin-specific features (e.g. Filecoin addresses) in our contracts, we would rely on Filecoin precompiles, and that breaks a lot of testing libraries. A naive solution could be to maintain two versions of each contract. The Saturn team has also started working on a local FVM test executor written in Rust that allows running unit tests on Solidity smart contracts that use Filecoin precompiles. This executor is still rudimentary and needs improvement to be a reliable testing tool.

Unit Tests

Each component should have unit tests to make sure its functionality works. For example, we should have extensive unit tests for the log commitment scheme, evaluation functions, etc.

Integration Tests

If we have bindings for the contracts, we can easily write integration tests for some end-to-end flows that run on the Calibration testnet. This just requires a burner wallet with some test FIL in it. Saturn already has examples of this.

Auditing

After we complete our smart contracts, we should have them audited and publish the audit publicly.

Observability

Take inspiration from the Saturn internal dashboard.

Create a generalized dashboard template for all Meridian systems.

SPARK x Meridian roadmap

SPARK will be the first Meridian implementation. The use case of SPARK will be used to create the reusable infrastructure (services & smart contracts) that Meridian will offer to future implementors.

Quality criteria

  • For each service or smart contract
    • Testsuite
    • testnet deployment
    • mainnet deployment
  • For each smart contract
    • Audit
    • Static analysis
  • For each service
    • Observability through Sentry & Grafana

No walking skeleton

While it is a popular pattern (and one the team likes) to first create a walking skeleton, with all the components of a software system in their most basic form talking to each other, it is not a great fit for Meridian.

  • The sooner the finished measure step is deployed, the sooner we will collect real data that will later be consumed by the following steps. Therefore, developing the steps in parallel will shift the timeline unfavourably
  • The steps have clear boundaries with well-enough-defined interfaces, thanks to the theoretical foundation of the Impact Evaluator Framework
  • A deployed measure step can already collect data, while a deployed yet unfinished evaluate step shouldn’t start evaluating them
  • Sequential flow of development fits sequential flow of system

The team is therefore going to implement measure, evaluate and reward in the traditional waterfall model. One could also argue that the result of this document’s roadmap will be just this walking skeleton.

Item | DRI | Notes
Interface with legal | PM |
Business model exploration | PM |
Boost interface | Eng Lead |
Smart contract contractors | PM | Contractor?
Smart contract auditors | PM | Contractor
Measure Eng work | Eng Lead |
Evaluate Eng work | Eng Lead |
Reward Eng work | Eng Lead |
Meridian Website | PM | Contractor
Update Station Website | PM + Contractor | Contractor
Lab week planning | PM |
Swag | PM |
Product market fit work | PM |

Notes 2023-08-03

Need a smart contract that is running the whole loop

Want it to operate as close as you can to how block rewards operate

Structure on chain that nobody controls

Entity that nobody controls that is on chain, that runs the process of rewarding

Look at Juan’s workshops into how contract structure should work

Taking blockchain reward model and not change it at all

Overall smart contract for the IE

Per round

Sampling steps into the SP to find a CID

Orchestrator should sample CIDs

There is no list of CIDs anywhere

Flesh out the e2e structure of Meridian and deploy a version of it, even if measurement and rewarding isn’t as good as it should be.

For small amounts of reward, it won’t be worth committing fraud

Honest vs fraudulent log classifier

Block reward model has a concrete time structure

Next steps

  • Clean up document
  • Start document: Round lengths
  • Start document: IE PaaS deployment